Overview

Dataset statistics

Number of variables: 20
Number of observations: 338592
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 51.7 MiB
Average record size in memory: 160.0 B

Variable types

Numeric: 15
Categorical: 5

Warnings

Fukusyoku has a high cardinality: 2420 distinct values [High cardinality]
Wakuban is highly correlated with Umaban [High correlation]
Umaban is highly correlated with Wakuban [High correlation]
FutanBefore is highly skewed (γ1 = 35.36091292) [Skewed]
UmaKigoCD has 306643 (90.6%) zeros [Zeros]
FutanBefore has 338321 (99.9%) zeros [Zeros]
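
The Skewed and Zeros warnings for FutanBefore can be cross-checked against that variable's frequency table later in this report. The sketch below does so in plain Python. The `counts` mapping is transcribed from that table, and the moment-based skewness formula (third central moment divided by the 1.5 power of the second) is an assumption about how γ1 is computed, so small differences in the trailing digits are to be expected.

```python
# Value -> count for FutanBefore, transcribed from its frequency table.
counts = {0: 338321, 480: 1, 490: 2, 510: 29, 520: 42, 530: 49,
          540: 64, 550: 41, 560: 28, 570: 14, 580: 1}

n = sum(counts.values())                      # number of observations
zero_share = counts[0] / n                    # fraction of rows equal to zero
mean = sum(v * c for v, c in counts.items()) / n

# Second and third central moments (population form, an assumption).
m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
skewness = m3 / m2 ** 1.5

print(f"observations: {n}")
print(f"zeros: {zero_share:.1%}")
print(f"skewness: {skewness:.2f}")
```

Because all but 271 rows are zero, a few values near 500 are enough to produce a skewness this large.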

Reproduction

Analysis started: 2021-04-07 12:46:34.321370
Analysis finished: 2021-04-07 12:48:19.798108
Duration: 1 minute and 45.48 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
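
The reported duration follows directly from the two timestamps. As a minimal sketch using only the Python standard library (the format string is an assumption that matches the timestamps as printed here):

```python
from datetime import datetime

# Timestamp layout assumed from the values printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-04-07 12:46:34.321370", FMT)
finished = datetime.strptime("2021-04-07 12:48:19.798108", FMT)

# Elapsed wall-clock time, split into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```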

Variables

JyoCD
Real number (ℝ≥0)

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.034351077
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 6
median: 7
Q3: 9
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.957488196
Coefficient of variation (CV): 0.2782755899
Kurtosis: -0.6791228629
Mean: 7.034351077
Median Absolute Deviation (MAD): 2
Skewness: -0.2821718823
Sum: 2381775
Variance: 3.831760037
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
8 | 62582 | 18.5%
6 | 61963 | 18.3%
9 | 60683 | 17.9%
5 | 56260 | 16.6%
10 | 34994 | 10.3%
7 | 33726 | 10.0%
4 | 15333 | 4.5%
3 | 9541 | 2.8%
2 | 2407 | 0.7%
1 | 1103 | 0.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1103 | 0.3%
2 | 2407 | 0.7%
3 | 9541 | 2.8%
4 | 15333 | 4.5%
5 | 56260 | 16.6%

Maximum 5 values

Value | Count | Frequency (%)
10 | 34994 | 10.3%
9 | 60683 | 17.9%
8 | 62582 | 18.5%
7 | 33726 | 10.0%
6 | 61963 | 18.3%
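
Because JyoCD takes only ten distinct values, its descriptive statistics can be recomputed from the frequency table above. The following is a minimal sketch in plain Python. The `counts` mapping is transcribed from that table, and dividing by n - 1 (sample variance) is an assumption that happens to reproduce the reported variance.

```python
import math

# Value -> count for JyoCD, transcribed from the frequency table above.
counts = {1: 1103, 2: 2407, 3: 9541, 4: 15333, 5: 56260,
          6: 61963, 7: 33726, 8: 62582, 9: 60683, 10: 34994}

n = sum(counts.values())                          # number of observations
total = sum(v * c for v, c in counts.items())     # "Sum" in the report
mean = total / n

# Sample variance (divide by n - 1); an assumption, see the note above.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                   # coefficient of variation

print(f"n={n} sum={total} mean={mean:.9f}")
print(f"variance={variance:.9f} std={std:.9f} cv={cv:.10f}")
```

The same approach works for any of the low-cardinality numeric variables in this report.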

Kaiji
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.837804792
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 4
95-th percentile: 5
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.380581853
Coefficient of variation (CV): 0.4864964135
Kurtosis: -1.031254126
Mean: 2.837804792
Median Absolute Deviation (MAD): 1
Skewness: 0.2717196575
Sum: 960858
Variance: 1.906006254
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value | Count | Frequency (%)
2 | 86421 | 25.5%
3 | 71814 | 21.2%
1 | 68827 | 20.3%
4 | 57965 | 17.1%
5 | 49503 | 14.6%
6 | 4062 | 1.2%

Minimum 5 values

Value | Count | Frequency (%)
1 | 68827 | 20.3%
2 | 86421 | 25.5%
3 | 71814 | 21.2%
4 | 57965 | 17.1%
5 | 49503 | 14.6%

Maximum 5 values

Value | Count | Frequency (%)
6 | 4062 | 1.2%
5 | 49503 | 14.6%
4 | 57965 | 17.1%
3 | 71814 | 21.2%
2 | 86421 | 25.5%

Nichiji
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.837054626
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 7
95-th percentile: 9
Maximum: 12
Range: 11
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.648915752
Coefficient of variation (CV): 0.5476299023
Kurtosis: -0.5391884375
Mean: 4.837054626
Median Absolute Deviation (MAD): 2
Skewness: 0.3627786178
Sum: 1637788
Variance: 7.016754662
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
4 | 41100 | 12.1%
3 | 40690 | 12.0%
6 | 40531 | 12.0%
5 | 40228 | 11.9%
2 | 40172 | 11.9%
1 | 39609 | 11.7%
7 | 35734 | 10.6%
8 | 34547 | 10.2%
9 | 12504 | 3.7%
10 | 4688 | 1.4%
Other values (2) | 8789 | 2.6%

Minimum 5 values

Value | Count | Frequency (%)
1 | 39609 | 11.7%
2 | 40172 | 11.9%
3 | 40690 | 12.0%
4 | 41100 | 12.1%
5 | 40228 | 11.9%

Maximum 5 values

Value | Count | Frequency (%)
12 | 4430 | 1.3%
11 | 4359 | 1.3%
10 | 4688 | 1.4%
9 | 12504 | 3.7%
8 | 34547 | 10.2%

RaceNum
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.460512948
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 6
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.492256145
Coefficient of variation (CV): 0.5405540046
Kurtosis: -1.238932552
Mean: 6.460512948
Median Absolute Deviation (MAD): 3
Skewness: 0.03188794657
Sum: 2187478
Variance: 12.19585299
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
12 | 29742 | 8.8%
3 | 29682 | 8.8%
11 | 29422 | 8.7%
2 | 29073 | 8.6%
1 | 29042 | 8.6%
4 | 28790 | 8.5%
7 | 28069 | 8.3%
8 | 27900 | 8.2%
6 | 27684 | 8.2%
5 | 27483 | 8.1%
Other values (2) | 51705 | 15.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 29042 | 8.6%
2 | 29073 | 8.6%
3 | 29682 | 8.8%
4 | 28790 | 8.5%
5 | 27483 | 8.1%

Maximum 5 values

Value | Count | Frequency (%)
12 | 29742 | 8.8%
11 | 29422 | 8.7%
10 | 26991 | 8.0%
9 | 24714 | 7.3%
8 | 27900 | 8.2%

Wakuban
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.736183962
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 5
Q3: 7
95-th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.276567901
Coefficient of variation (CV): 0.4806755649
Kurtosis: -1.212228567
Mean: 4.736183962
Median Absolute Deviation (MAD): 2
Skewness: -0.1248299126
Sum: 1603634
Variance: 5.18276141
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value | Count | Frequency (%)
8 | 48743 | 14.4%
7 | 47647 | 14.1%
6 | 44617 | 13.2%
5 | 43143 | 12.7%
4 | 41576 | 12.3%
3 | 40048 | 11.8%
2 | 37478 | 11.1%
1 | 35340 | 10.4%

Minimum 5 values

Value | Count | Frequency (%)
1 | 35340 | 10.4%
2 | 37478 | 11.1%
3 | 40048 | 11.8%
4 | 41576 | 12.3%
5 | 43143 | 12.7%

Maximum 5 values

Value | Count | Frequency (%)
8 | 48743 | 14.4%
7 | 47647 | 14.1%
6 | 44617 | 13.2%
5 | 43143 | 12.7%
4 | 41576 | 12.3%

Umaban
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.937290308
Minimum: 1
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 8
Q3: 12
95-th percentile: 15
Maximum: 18
Range: 17
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.489699079
Coefficient of variation (CV): 0.5656463232
Kurtosis: -1.020836087
Mean: 7.937290308
Median Absolute Deviation (MAD): 4
Skewness: 0.1861529636
Sum: 2687503
Variance: 20.15739782
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=18)

Common values

Value | Count | Frequency (%)
5 | 23545 | 7.0%
6 | 23534 | 7.0%
7 | 23520 | 6.9%
3 | 23503 | 6.9%
2 | 23501 | 6.9%
1 | 23483 | 6.9%
4 | 23407 | 6.9%
8 | 23247 | 6.9%
9 | 22814 | 6.7%
10 | 22136 | 6.5%
Other values (8) | 105902 | 31.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 23483 | 6.9%
2 | 23501 | 6.9%
3 | 23503 | 6.9%
4 | 23407 | 6.9%
5 | 23545 | 7.0%

Maximum 5 values

Value | Count | Frequency (%)
18 | 2208 | 0.7%
17 | 2596 | 0.8%
16 | 11830 | 3.5%
15 | 14140 | 4.2%
14 | 16435 | 4.9%

KettoNum
Real number (ℝ≥0)

Distinct: 54321
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011682419
Minimum: 2000100030
Maximum: 2018110145
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 2000100030
5-th percentile: 2006102736
Q1: 2009101517
median: 2012101207
Q3: 2014110115
95-th percentile: 2017104470
Maximum: 2018110145
Range: 18010115
Interquartile range (IQR): 5008598

Descriptive statistics

Standard deviation: 3560527.423
Coefficient of variation (CV): 0.001769925208
Kurtosis: -0.892800638
Mean: 2011682419
Median Absolute Deviation (MAD): 2999216
Skewness: -0.1315503949
Sum: 6.811395735 × 10^14
Variance: 1.267735553 × 10^13
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
2010200007 | 71 | < 0.1%
2011100625 | 70 | < 0.1%
2010101161 | 68 | < 0.1%
2011101542 | 61 | < 0.1%
2010101326 | 59 | < 0.1%
2010102402 | 57 | < 0.1%
2010105167 | 54 | < 0.1%
2007105629 | 54 | < 0.1%
2014106529 | 54 | < 0.1%
2013106474 | 53 | < 0.1%
Other values (54311) | 337991 | 99.8%

Minimum 5 values

Value | Count | Frequency (%)
2000100030 | 4 | < 0.1%
2000100231 | 6 | < 0.1%
2000100785 | 1 | < 0.1%
2000100965 | 4 | < 0.1%
2000101470 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2018110145 | 2 | < 0.1%
2018110139 | 2 | < 0.1%
2018110138 | 2 | < 0.1%
2018110136 | 1 | < 0.1%
2018110135 | 2 | < 0.1%

UmaKigoCD
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5410080569
Minimum: 0
Maximum: 26
Zeros: 306643
Zeros (%): 90.6%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 5
Maximum: 26
Range: 26
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.834139141
Coefficient of variation (CV): 3.390225188
Kurtosis: 37.08666405
Mean: 0.5410080569
Median Absolute Deviation (MAD): 0
Skewness: 4.809565025
Sum: 183181
Variance: 3.36406639
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value | Count | Frequency (%)
0 | 306643 | 90.6%
5 | 21025 | 6.2%
6 | 9760 | 2.9%
21 | 544 | 0.2%
11 | 536 | 0.2%
26 | 82 | < 0.1%
22 | 2 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 306643 | 90.6%
5 | 21025 | 6.2%
6 | 9760 | 2.9%
11 | 536 | 0.2%
21 | 544 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
26 | 82 | < 0.1%
22 | 2 | < 0.1%
21 | 544 | 0.2%
11 | 536 | 0.2%
6 | 9760 | 2.9%

SexCD
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 3

Common values

Value | Count | Frequency (%)
1 | 190044 | 56.1%
2 | 135746 | 40.1%
3 | 12802 | 3.8%

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
1 | 190044 | 56.1%
2 | 135746 | 40.1%
3 | 12802 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 338592 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
1 | 190044 | 56.1%
2 | 135746 | 40.1%
3 | 12802 | 3.8%

Most occurring scripts

Value | Count | Frequency (%)
Common | 338592 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
1 | 190044 | 56.1%
2 | 135746 | 40.1%
3 | 12802 | 3.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 338592 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 190044 | 56.1%
2 | 135746 | 40.1%
3 | 12802 | 3.8%

HinsyuCD
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common values

Value | Count | Frequency (%)
1 | 338476 | > 99.9%
2 | 116 | < 0.1%

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
1 | 338476 | > 99.9%
2 | 116 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 338592 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
1 | 338476 | > 99.9%
2 | 116 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 338592 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
1 | 338476 | > 99.9%
2 | 116 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 338592 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 338476 | > 99.9%
2 | 116 | < 0.1%

KeiroCD
Real number (ℝ≥0)

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.134846659
Minimum: 1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 3
Q3: 4
95-th percentile: 7
Maximum: 11
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.568562341
Coefficient of variation (CV): 0.5003633387
Kurtosis: 0.4628191168
Mean: 3.134846659
Median Absolute Deviation (MAD): 1
Skewness: 0.5386208545
Sum: 1061434
Variance: 2.460387817
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=8)

Common values

Value | Count | Frequency (%)
3 | 143648 | 42.4%
1 | 79622 | 23.5%
4 | 66048 | 19.5%
5 | 24108 | 7.1%
7 | 19027 | 5.6%
6 | 4850 | 1.4%
2 | 1148 | 0.3%
11 | 141 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 79622 | 23.5%
2 | 1148 | 0.3%
3 | 143648 | 42.4%
4 | 66048 | 19.5%
5 | 24108 | 7.1%

Maximum 5 values

Value | Count | Frequency (%)
11 | 141 | < 0.1%
7 | 19027 | 5.6%
6 | 4850 | 1.4%
5 | 24108 | 7.1%
4 | 66048 | 19.5%

Barei
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.681864899
Minimum: 2
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 3
median: 3
Q3: 4
95-th percentile: 6
Maximum: 13
Range: 11
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.401573669
Coefficient of variation (CV): 0.3806694997
Kurtosis: 0.9937131313
Mean: 3.681864899
Median Absolute Deviation (MAD): 1
Skewness: 1.061768409
Sum: 1246650
Variance: 1.96440875
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
3 | 135641 | 40.1%
4 | 60264 | 17.8%
2 | 58656 | 17.3%
5 | 44862 | 13.2%
6 | 23815 | 7.0%
7 | 10280 | 3.0%
8 | 3807 | 1.1%
9 | 996 | 0.3%
10 | 214 | 0.1%
11 | 47 | < 0.1%
Other values (2) | 10 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
2 | 58656 | 17.3%
3 | 135641 | 40.1%
4 | 60264 | 17.8%
5 | 44862 | 13.2%
6 | 23815 | 7.0%

Maximum 5 values

Value | Count | Frequency (%)
13 | 2 | < 0.1%
12 | 8 | < 0.1%
11 | 47 | < 0.1%
10 | 214 | 0.1%
9 | 996 | 0.3%

TozaiCD
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common values

Value | Count | Frequency (%)
2 | 188030 | 55.5%
1 | 149934 | 44.3%
3 | 546 | 0.2%
4 | 82 | < 0.1%

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
2 | 188030 | 55.5%
1 | 149934 | 44.3%
3 | 546 | 0.2%
4 | 82 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 338592 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
2 | 188030 | 55.5%
1 | 149934 | 44.3%
3 | 546 | 0.2%
4 | 82 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 338592 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
2 | 188030 | 55.5%
1 | 149934 | 44.3%
3 | 546 | 0.2%
4 | 82 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 338592 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
2 | 188030 | 55.5%
1 | 149934 | 44.3%
3 | 546 | 0.2%
4 | 82 | < 0.1%

ChokyosiCode
Real number (ℝ≥0)

Distinct: 502
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 911.6591975
Minimum: 106
Maximum: 5750
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 106
5-th percentile: 359
Q1: 438
median: 1055
Q3: 1101
95-th percentile: 1148
Maximum: 5750
Range: 5644
Interquartile range (IQR): 663

Descriptive statistics

Standard deviation: 360.6743557
Coefficient of variation (CV): 0.3956241068
Kurtosis: 44.23116969
Mean: 911.6591975
Median Absolute Deviation (MAD): 50
Skewness: 2.883744135
Sum: 308680511
Variance: 130085.9909
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1075 | 2928 | 0.9%
1099 | 2891 | 0.9%
1028 | 2879 | 0.9%
1039 | 2588 | 0.8%
1110 | 2577 | 0.8%
1002 | 2576 | 0.8%
411 | 2531 | 0.7%
1046 | 2476 | 0.7%
1022 | 2405 | 0.7%
438 | 2401 | 0.7%
Other values (492) | 312340 | 92.2%

Minimum 5 values

Value | Count | Frequency (%)
106 | 157 | < 0.1%
110 | 68 | < 0.1%
119 | 198 | 0.1%
131 | 458 | 0.1%
138 | 515 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
5750 | 1 | < 0.1%
5749 | 1 | < 0.1%
5748 | 1 | < 0.1%
5747 | 1 | < 0.1%
5746 | 1 | < 0.1%

BanusiCode
Real number (ℝ≥0)

Distinct: 2663
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 492980.3969
Minimum: 31
Maximum: 999031
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 31
5-th percentile: 32800
Q1: 261007
median: 495005
Q3: 754030
95-th percentile: 948006
Maximum: 999031
Range: 999000
Interquartile range (IQR): 493023

Descriptive statistics

Standard deviation: 286532.2733
Coefficient of variation (CV): 0.5812244768
Kurtosis: -1.114891849
Mean: 492980.3969
Median Absolute Deviation (MAD): 242795
Skewness: -0.005857131869
Sum: 1.669192186 × 10^11
Variance: 8.210074361 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
523005 | 7099 | 2.1%
226800 | 6994 | 2.1%
415800 | 6975 | 2.1%
486800 | 6380 | 1.9%
546800 | 6313 | 1.9%
506800 | 4753 | 1.4%
3060 | 3802 | 1.1%
674004 | 3727 | 1.1%
716800 | 3607 | 1.1%
547800 | 3053 | 0.9%
Other values (2653) | 285889 | 84.4%
ValueCountFrequency (%)
318
 
< 0.1%
8032
 
< 0.1%
103136
 
< 0.1%
1060130
< 0.1%
17004
 
< 0.1%
Maximum 5 values

Value | Count | Frequency (%)
999031 | 1 | < 0.1%
998033 | 10 | < 0.1%
998030 | 1 | < 0.1%
998009 | 29 | < 0.1%
998007 | 105 | < 0.1%

Fukusyoku
Categorical

HIGH CARDINALITY

Distinct: 2420
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 28
Median length: 10
Mean length: 9.911332814
Min length: 1

Characters and Unicode

Total characters: 3355898
Distinct characters: 117
Distinct categories: 4
Distinct scripts: 5
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 131
Unique (%): < 0.1%

Sample

1st row: 鼠,海老襷,海老袖鼠一本輪
2nd row: 鼠,海老襷,海老袖鼠一本輪
3rd row: 鼠,海老襷,海老袖鼠一本輪
4th row: 赤,緑元禄,赤袖
5th row: 白,水色玉霰,紫袖

Common values

Value | Count | Frequency (%)
青,桃襷,桃袖 | 7099 | 2.1%
黒,赤十字襷,袖黄縦縞 | 6994 | 2.1%
黄,黒縦縞,袖青一本輪 | 6975 | 2.1%
緑,白二本輪,白袖赤一本輪 | 6380 | 1.9%
赤,緑袖赤一本輪 | 6313 | 1.9%
水色,赤玉霰,袖赤一本輪 | 4753 | 1.4%
緑,赤星散,赤袖 | 3727 | 1.1%
赤,緑格子,赤袖 | 3607 | 1.1%
緑,赤襷,赤袖白一本輪 | 3053 | 0.9%
桃,緑一本輪,袖黄縦縞 | 3017 | 0.9%
Other values (2410) | 286674 | 84.7%

Histogram of lengths of the category

Most occurring characters

Count | Frequency (%)
608672 | 18.1%
306604 | 9.1%
213903 | 6.4%
213899 | 6.4%
204107 | 6.1%
172120 | 5.1%
152668 | 4.5%
138351 | 4.1%
112619 | 3.4%
106614 | 3.2%
Other values (107): 1126341 | 33.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2747064 | 81.9%
Other Punctuation | 608809 | 18.1%
Uppercase Letter | 13 | < 0.1%
Modifier Letter | 12 | < 0.1%

Most frequent character per category

Other Letter
Count | Frequency (%)
306604 | 11.2%
213903 | 7.8%
213899 | 7.8%
204107 | 7.4%
172120 | 6.3%
152668 | 5.6%
138351 | 5.0%
112619 | 4.1%
106614 | 3.9%
92650 | 3.4%
Other values (101): 1033529 | 37.6%

Uppercase Letter
Count | Frequency (%)
9 | 69.2%
3 | 23.1%
1 | 7.7%

Other Punctuation
Count | Frequency (%)
608672 | > 99.9%
137 | < 0.1%

Modifier Letter
Count | Frequency (%)
12 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Han | 2675090 | 79.7%
Common | 608821 | 18.1%
Katakana | 68527 | 2.0%
Hiragana | 3447 | 0.1%
Latin | 13 | < 0.1%

Most frequent character per script

Han
Count | Frequency (%)
306604 | 11.5%
213903 | 8.0%
213899 | 8.0%
204107 | 7.6%
172120 | 6.4%
152668 | 5.7%
138351 | 5.2%
112619 | 4.2%
106614 | 4.0%
92650 | 3.5%
Other values (66): 961555 | 35.9%

Katakana
Count | Frequency (%)
11343 | 16.6%
11343 | 16.6%
11341 | 16.5%
11340 | 16.5%
11339 | 16.5%
11339 | 16.5%
389 | 0.6%
10 | < 0.1%
10 | < 0.1%
9 | < 0.1%
Other values (22): 64 | 0.1%

Common
Count | Frequency (%)
608672 | > 99.9%
137 | < 0.1%
12 | < 0.1%

Hiragana
Count | Frequency (%)
1149 | 33.3%
1149 | 33.3%
1149 | 33.3%

Latin
Count | Frequency (%)
9 | 69.2%
3 | 23.1%
1 | 7.7%

Most occurring blocks

Value | Count | Frequency (%)
CJK | 2675090 | 79.7%
None | 608685 | 18.1%
Katakana | 68676 | 2.0%
Hiragana | 3447 | 0.1%

Most frequent character per block

CJK
Count | Frequency (%)
306604 | 11.5%
213903 | 8.0%
213899 | 8.0%
204107 | 7.6%
172120 | 6.4%
152668 | 5.7%
138351 | 5.2%
112619 | 4.2%
106614 | 4.0%
92650 | 3.5%
Other values (66): 961555 | 35.9%

None
Count | Frequency (%)
608672 | > 99.9%
9 | < 0.1%
3 | < 0.1%
1 | < 0.1%

Katakana
Count | Frequency (%)
11343 | 16.5%
11343 | 16.5%
11341 | 16.5%
11340 | 16.5%
11339 | 16.5%
11339 | 16.5%
389 | 0.6%
137 | 0.2%
12 | < 0.1%
10 | < 0.1%
Other values (24): 83 | 0.1%

Hiragana
Count | Frequency (%)
1149 | 33.3%
1149 | 33.3%
1149 | 33.3%

Futan
Real number (ℝ≥0)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 550.602687
Minimum: 470
Maximum: 650
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 470
5-th percentile: 520
Q1: 540
median: 550
Q3: 560
95-th percentile: 570
Maximum: 650
Range: 180
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 17.67836287
Coefficient of variation (CV): 0.03210729495
Kurtosis: 0.9997653313
Mean: 550.602687
Median Absolute Deviation (MAD): 10
Skewness: 0.1277832991
Sum: 186429665
Variance: 312.5245139
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=27)

Common values

Value | Count | Frequency (%)
540 | 94120 | 27.8%
560 | 65643 | 19.4%
550 | 64787 | 19.1%
570 | 57590 | 17.0%
530 | 18451 | 5.4%
520 | 13381 | 4.0%
510 | 10128 | 3.0%
600 | 7519 | 2.2%
580 | 3082 | 0.9%
590 | 1438 | 0.4%
Other values (17) | 2453 | 0.7%

Minimum 5 values

Value | Count | Frequency (%)
470 | 1 | < 0.1%
480 | 28 | < 0.1%
485 | 1 | < 0.1%
490 | 401 | 0.1%
500 | 891 | 0.3%

Maximum 5 values

Value | Count | Frequency (%)
650 | 1 | < 0.1%
640 | 2 | < 0.1%
635 | 22 | < 0.1%
630 | 210 | 0.1%
620 | 177 | 0.1%

FutanBefore
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4294549192
Minimum: 0
Maximum: 580
Zeros: 338321
Zeros (%): 99.9%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 580
Range: 580
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 15.18197623
Coefficient of variation (CV): 35.35173437
Kurtosis: 1249.719666
Mean: 0.4294549192
Median Absolute Deviation (MAD): 0
Skewness: 35.36091292
Sum: 145410
Variance: 230.4924022
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=11)

Common values

Value | Count | Frequency (%)
0 | 338321 | 99.9%
540 | 64 | < 0.1%
530 | 49 | < 0.1%
520 | 42 | < 0.1%
550 | 41 | < 0.1%
510 | 29 | < 0.1%
560 | 28 | < 0.1%
570 | 14 | < 0.1%
490 | 2 | < 0.1%
480 | 1 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 338321 | 99.9%
480 | 1 | < 0.1%
490 | 2 | < 0.1%
510 | 29 | < 0.1%
520 | 42 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
580 | 1 | < 0.1%
570 | 14 | < 0.1%
560 | 28 | < 0.1%
550 | 41 | < 0.1%
540 | 64 | < 0.1%

Blinker
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 338592
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common values

Value | Count | Frequency (%)
0 | 313451 | 92.6%
1 | 25141 | 7.4%

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 313451 | 92.6%
1 | 25141 | 7.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 338592 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 313451 | 92.6%
1 | 25141 | 7.4%

Most occurring scripts

Value | Count | Frequency (%)
Common | 338592 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 313451 | 92.6%
1 | 25141 | 7.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 338592 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 313451 | 92.6%
1 | 25141 | 7.4%

KisyuCode
Real number (ℝ≥0)

Distinct: 424
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1414.394085
Minimum: 405
Maximum: 5598
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.6 MiB

Quantile statistics

Minimum: 405
5-th percentile: 660
Q1: 1018
median: 1095
Q3: 1141
95-th percentile: 5328
Maximum: 5598
Range: 5193
Interquartile range (IQR): 123

Descriptive statistics

Standard deviation: 1268.504458
Coefficient of variation (CV): 0.8968536218
Kurtosis: 5.515425387
Mean: 1414.394085
Median Absolute Deviation (MAD): 62
Skewness: 2.696622159
Sum: 478902522
Variance: 1609103.559
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
732 | 7973 | 2.4%
1018 | 7608 | 2.2%
1126 | 6632 | 2.0%
1014 | 5684 | 1.7%
1115 | 5567 | 1.6%
1075 | 5398 | 1.6%
5203 | 5355 | 1.6%
422 | 5282 | 1.6%
1088 | 5154 | 1.5%
1102 | 5148 | 1.5%
Other values (414) | 278791 | 82.3%

Minimum 5 values

Value | Count | Frequency (%)
405 | 9 | < 0.1%
409 | 42 | < 0.1%
410 | 6 | < 0.1%
422 | 5282 | 1.6%
423 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
5598 | 1 | < 0.1%
5597 | 3 | < 0.1%
5596 | 145 | < 0.1%
5589 | 2 | < 0.1%
5588 | 1 | < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
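
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` and the toy gate and horse numbers are hypothetical, chosen only to resemble the Wakuban and Umaban columns):

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so raw sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, not taken from the dataset.
wakuban = [1, 1, 2, 3, 4, 5, 6, 7, 8, 8]
umaban = [1, 2, 3, 5, 7, 9, 11, 13, 15, 16]
r = pearson_r(wakuban, umaban)

# Shifting and rescaling one variable leaves r unchanged.
r_shifted = pearson_r(wakuban, [3 * v + 10 for v in umaban])
print(r, r_shifted)
```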

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
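
A minimal sketch of that recipe in plain Python follows. The helper names are hypothetical, and assigning tied values the average of their rank positions is an assumption (it is the usual convention, but the report does not state which one it uses).

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A nonlinear but strictly increasing relationship: rho is 1, Pearson's r is not.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```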

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
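
The definition above corresponds to the variant usually called tau-a. A minimal sketch in plain Python is shown below; the function name is hypothetical, and the quadratic loop is only practical for small samples. Statistical libraries commonly default to tau-b, which adjusts the denominator for ties, so their results may differ from this sketch when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie in x or y: counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```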

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
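
The following is a minimal sketch of a bias-corrected Cramér's V in plain Python. The function name is hypothetical, and the correction terms follow the commonly cited form of Bergsma's proposal as understood here. It is not confirmed to match the exact implementation used to produce this report.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form): shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0               # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)

# Hypothetical labels: y is fully determined by x in the first case,
# and independent of x in the second.
perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(perfect, independent)
```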

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

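Index | JyoCD | Kaiji | Nichiji | RaceNum | Wakuban | Umaban | KettoNum | UmaKigoCD | SexCD | HinsyuCD | KeiroCD | Barei | TozaiCD | ChokyosiCode | BanusiCode | Fukusyoku | Futan | FutanBefore | Blinker | KisyuCode
0 | 1 | 1 | 7 | 9 | 1 | 1 | 2006104151 | 0 | 1 | 1 | 5 | 4 | 2 | 1021 | 730800 | 鼠,海老襷,海老袖鼠一本輪 | 570 | 0 | 0 | 1032
1 | 1 | 2 | 3 | 11 | 8 | 13 | 2006104151 | 0 | 1 | 1 | 5 | 4 | 2 | 1021 | 730800 | 鼠,海老襷,海老袖鼠一本輪 | 570 | 0 | 0 | 1126
2 | 2 | 2 | 2 | 11 | 7 | 7 | 2006104151 | 0 | 1 | 1 | 5 | 4 | 2 | 1021 | 730800 | 鼠,海老襷,海老袖鼠一本輪 | 570 | 0 | 0 | 1032
3 | 7 | 2 | 6 | 11 | 6 | 12 | 2006104151 | 0 | 1 | 1 | 5 | 4 | 2 | 1021 | 730800 | 赤,緑元禄,赤袖 | 550 | 0 | 0 | 1030
4 | 1 | 1 | 7 | 9 | 3 | 5 | 2006105020 | 0 | 3 | 1 | 1 | 4 | 2 | 1044 | 572800 | 白,水色玉霰,紫袖 | 560 | 0 | 0 | 705
5 | 2 | 1 | 2 | 8 | 8 | 16 | 2006105020 | 0 | 3 | 1 | 1 | 4 | 2 | 1044 | 572800 | 白,水色玉霰,紫袖 | 570 | 0 | 0 | 1074
6 | 2 | 1 | 3 | 8 | 4 | 6 | 2006105020 | 0 | 3 | 1 | 1 | 4 | 2 | 1044 | 572800 | 白,水色玉霰,紫袖 | 570 | 0 | 0 | 1074
7 | 3 | 1 | 1 | 6 | 8 | 16 | 2006105020 | 0 | 3 | 1 | 1 | 4 | 2 | 1044 | 572800 | 白,水色玉霰,紫袖 | 570 | 0 | 0 | 1074
8 | 3 | 1 | 6 | 11 | 1 | 1 | 2006105020 | 0 | 3 | 1 | 1 | 4 | 2 | 1044 | 572800 | 白,水色玉霰,紫袖 | 560 | 0 | 0 | 1074
9 | 7 | 2 | 4 | 7 | 4 | 8 | 2006105020 | 0 | 3 | 1 | 1 | 4 | 2 | 1044 | 572800 | 白,水色玉霰,紫袖 | 570 | 0 | 0 | 1074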

Last rows

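Index | JyoCD | Kaiji | Nichiji | RaceNum | Wakuban | Umaban | KettoNum | UmaKigoCD | SexCD | HinsyuCD | KeiroCD | Barei | TozaiCD | ChokyosiCode | BanusiCode | Fukusyoku | Futan | FutanBefore | Blinker | KisyuCode
338582 | 10 | 2 | 5 | 9 | 4 | 8 | 2018109122 | 21 | 2 | 1 | 5 | 2 | 3 | 5257 | 930031 | 薄紫,青四ツ割,袖青一本輪 | 550 | 0 | 0 | 5456
338583 | 10 | 2 | 5 | 9 | 4 | 7 | 2018109169 | 21 | 2 | 1 | 1 | 2 | 3 | 5661 | 908031 | 白,青四ツ割,袖青一本輪 | 550 | 0 | 0 | 5478
338584 | 10 | 2 | 6 | 2 | 8 | 17 | 2017106253 | 0 | 2 | 1 | 3 | 3 | 2 | 1049 | 789006 | 桃,青襷,青袖 | 540 | 0 | 0 | 1157
338585 | 10 | 2 | 6 | 6 | 6 | 11 | 2017104830 | 0 | 2 | 1 | 3 | 3 | 2 | 1062 | 318803 | 緑,黄十字襷,黒袖 | 510 | 0 | 0 | 1182
338586 | 10 | 2 | 6 | 7 | 1 | 1 | 2017102923 | 0 | 1 | 1 | 4 | 3 | 2 | 1087 | 346803 | 桃,赤十字襷,白袖赤縦縞 | 540 | 0 | 0 | 1018
338587 | 10 | 2 | 6 | 8 | 5 | 5 | 2017101519 | 21 | 2 | 1 | 1 | 3 | 3 | 5106 | 933031 | 白,黄四ツ割,袖黄一本輪 | 520 | 0 | 0 | 5587
338588 | 10 | 2 | 6 | 11 | 6 | 8 | 2015110047 | 6 | 1 | 1 | 5 | 5 | 2 | 1107 | 764001 | 黄,水色襷 | 570 | 0 | 0 | 1154
338589 | 10 | 2 | 8 | 2 | 6 | 12 | 2018102116 | 0 | 1 | 1 | 5 | 2 | 2 | 1039 | 788800 | 水色,赤十字襷,赤袖 | 540 | 0 | 0 | 1034
338590 | 10 | 2 | 8 | 4 | 7 | 11 | 2017105497 | 0 | 3 | 1 | 3 | 3 | 2 | 1159 | 634033 | 白,緑星散 | 560 | 0 | 0 | 1018
338591 | 10 | 2 | 8 | 5 | 8 | 16 | 2018106621 | 0 | 1 | 1 | 3 | 2 | 2 | 1136 | 523005 | 青,桃襷,桃袖 | 540 | 0 | 0 | 1037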